Most researchers acknowledge an intrinsic hierarchy in the scholarly journals ("journal
rank") that they submit their work to, and adjust not only their submission but also 
their reading strategies accordingly. On the other hand, much has been written about 
the negative effects of institutionalizing journal rank as an impact measure. So far, 
contributions to the debate concerning the limitations of journal rank as a scientific impact 
assessment tool have either lacked data, or relied on only a few studies. In this review, 
we present the most recent and pertinent data on the consequences of our current 
scholarly communication system with respect to various measures of scientific quality 
(such as utility/citations, methodological soundness, expert ratings or retractions). These 
data corroborate previous hypotheses: using journal rank as an assessment tool is bad 
scientific practice. Moreover, the data lead us to argue that any journal rank (not only the 
currently-favored Impact Factor) would have this negative impact. Therefore, we suggest 
that abandoning journals altogether, in favor of a library-based scholarly communication 
system, will ultimately be necessary. This new system will use modern information 
technology to vastly improve the filter, sort and discovery functions of the current journal 
system. 
INTRODUCTION

Science is the bedrock of modern society, improving our lives 
through advances in medicine, communication, transportation, 
forensics, entertainment and countless other areas. Moreover, 
today's global problems cannot be solved without scientific input 
and understanding. The more our society relies on science, and 
the more our population becomes scientifically literate, the more 
important the reliability [i.e., veracity and integrity, or, "credibil- 
ity" (Ioannidis, 2012)] of scientific research becomes. Scientific 
research is largely a public endeavor, requiring public trust. 
Therefore, it is critical that public trust in science remains high. In 
other words, the reliability of science is not only a societal imper- 
ative, it is also vital to the scientific community itself. However, 
every scientific publication may in principle report results which 
prove to be unreliable, either unintentionally, in the case of hon- 
est error or statistical variability, or intentionally in the case 
of misconduct or fraud. Even under ideal circumstances, sci- 
ence can never provide us with absolute truth. In Karl Popper's 
words: "Science is not a system of certain, or established, state- 
ments" (Popper, 1995). Peer-review is one of the mechanisms 
which have evolved to increase the reliability of the scientific 
literature. 

At the same time, the current publication system is being 
used to structure the careers of the members of the scientific 
community by evaluating their success in obtaining publications 
in high-ranking journals. The hierarchical publication system 
("journal rank") used to communicate scientific results is thus 
central, not only to the composition of the scientific community
at large (by selecting its members), but also to science's position
in society. In recent years, the scientific study of the effectiveness 
of such measures of quality control has grown. 

RETRACTIONS AND THE DECLINE EFFECT 

A disturbing trend has recently gained wide public attention: The 
retraction rate of articles published in scientific journals, which 
had remained stable since the 1970's, began to increase rapidly 
in the early 2000's from 0.001% of the total to about 0.02% 
(Figure 1A). The year 2010 saw the creation and popularization
of a website dedicated to monitoring retractions
(http://retractionwatch.com), while 2011 has been described as "the
year of the retraction" (Hamilton, 2011). The reasons suggested
for retractions vary widely, with the recent sharp rise potentially
facilitated by an increased willingness of journals to issue
retractions, or increased scrutiny and error-detection from online
media. Although cases of clear scientific misconduct initially
constituted a minority of cases (Nath et al., 2006; Cokol et al.,
2007; Fanelli, 2009; Steen, 2011a; Van Noorden, 2011; Wager and
Williams, 2011), the fraction of retractions due to misconduct has
risen more sharply than the overall retraction rate, and the majority
of all retractions are now due to misconduct (Steen, 2011b; Fang et al.,
2012).

Retraction notices, a metric which is relatively easy to collect, 
only constitute the extreme end of a spectrum of unreliability that 
is inherent to the scientific method: we can hardly ever be entirely 
certain of our results (Popper, 1995). Much of the training scien- 
tists receive aims to reduce this uncertainty long before the work 
is submitted for publication. However, a less readily quantified
but more frequent phenomenon (compared to rare retractions)
has recently garnered attention, which calls into question the
effectiveness of this training. The "decline-effect," which is now
well-described, relates to the observation that the strength of evidence
for a particular finding often declines over time (Simmons
et al., 1999; Palmer, 2000; Moller and Jennions, 2001; Ioannidis,
2005b; Moller et al., 2005; Fanelli, 2010; Lehrer, 2010; Schooler,
2011; Simmons et al., 2011; Van Dongen, 2011; Bertamini and
Munafo, 2012; Gonon et al., 2012). This effect provides wider
scope for assessing the unreliability of scientific research than
retractions alone, and allows for more general conclusions to be
drawn.

FIGURE 1 | Current trends in the reliability of science. (A) Exponential fit
for PubMed retraction notices (data from pmretract.heroku.com). (B)
Relationship between year of publication and individual study effect size.
Data are taken from Munafo et al. (2007), and represent candidate gene
studies of the association between DRD2 genotype and alcoholism. The
effect size (y-axis) represents the individual study effect size (odds ratio; OR),
on a log-scale. This is plotted against the year of publication of the study
(x-axis). The size of the circle is proportional to the IF of the journal the
individual study was published in. Effect size is significantly negatively
correlated with year of publication. (C) Relationship between IF and extent to
which an individual study overestimates the likely true effect. Data are taken
from Munafo et al. (2009), and represent candidate gene studies of a number
of gene-phenotype associations of psychiatric phenotypes. The bias score
(y-axis) represents the effect size of the individual study divided by the pooled
effect size estimate indicated by meta-analysis, on a log-scale. Therefore, a
value greater than zero indicates that the study provided an over-estimate of
the likely true effect size. This is plotted against the IF of the journal the study
was published in (x-axis), on a log-scale. The size of the circle is proportional
to the sample size of the individual study. Bias score is significantly positively
correlated with IF, sample size significantly negatively. (D) Linear regression
with confidence intervals between IF and Fang and Casadevall's Retraction
Index (data provided by Fang and Casadevall, 2011).

Researchers make choices about data collection and analysis
which increase the chance of false-positives (i.e., researcher
bias) (Simmons et al., 1999; Simmons et al., 2011), and surprising
and novel effects are more likely to be published than
studies showing no effect. This is the well-known phenomenon
of publication bias (Song et al., 1999; Moller and Jennions, 2001;
Callaham, 2002; Moller et al., 2005; Munafo et al., 2007; Dwan
et al., 2008; Young et al., 2008; Schooler, 2011; Van Dongen,
2011). In other words, the probability of getting a paper published
might be biased toward larger initial effect sizes, which
are revealed by later studies to be not so large (or even absent
entirely), leading to the decline effect. While sound methodology
can help reduce researcher bias (Simmons et al., 1999), publication
bias is more difficult to address. Some journals are devoted
to publishing null results, or have sections devoted to these, but
coverage is uneven across disciplines and often these are not particularly
high-ranking or well-read (Schooler, 2011; Nosek et al.,
2012). Publication therein is typically not a cause for excitement
(Giner-Sorolla, 2012; Nosek et al., 2012), leading to an overall
low frequency of replication studies in many fields (Kelly, 2006;
Carpenter, 2012; Hartshorne and Schachner, 2012; Makel et al.,
2012; Yong, 2012). Publication bias is also exacerbated by a tendency
for journals to be less likely to publish replication studies
(or, worse still, failures to replicate) (Curry, 2009; Goldacre, 2011;
Sutton, 2011; Editorial, 2012; Hartshorne and Schachner, 2012;
Yong, 2012). Here we argue that the counter-measures proposed
to improve the reliability and veracity of science such as peer-review
in a hierarchy of journals or methodological training of
scientists may not be sufficient.

Frontiers in Human Neuroscience | www.frontiersin.org | June 2013 | Volume 7 | Article 291 | 2
Brembs et al. | Consequences of journal rank

While there is growing concern regarding the increasing rate of 
retractions in particular, and the unreliability of scientific findings 
in general, little consideration has been given to the infrastruc- 
ture by which scientists not only communicate their findings but 
also evaluate each other as a potential contributing factor. That 
is, to what extent does the environment in which science takes 
place contribute to the problems described above? By far the most 
common metric by which publications are evaluated, at least ini- 
tially, is the perceived prestige or rank of the journal in which they 
appear. Does the pressure to publish in prestigious, high-ranking 
journals contribute to the unreliability of science? 

THE DECLINE EFFECT AND JOURNAL RANK 

The common pattern seen where the decline effect has been 
documented is one of an initial publication in a high-ranking 
journal, followed by attempts at replication in lower-ranked jour- 
nals which either failed to replicate the original findings, or 
suggested a much weaker effect (Lehrer, 2010). Journal rank is 
most commonly assessed using Thomson Reuters' Impact Factor 
(IF), which has been shown to correspond well with subjective 
ratings of journal quality and rank (Gordon, 1982; Saha et al.,
2003; Yue et al., 2007; Sonderstrup-Andersen and
Sonderstrup-Andersen, 2008). One particular case (Munafo et al., 2007)
illustrates the decline effect (Figure 1B), and shows that early
publications both report a larger effect than subsequent studies
and are published in journals with a higher IF. These
observations raise the more general question of whether research
published in high-ranking journals is inherently less reliable than
research in lower-ranking journals.

As journal rank is also predictive of the incidence of fraud 
and misconduct in retracted publications, as opposed to other 
reasons for retraction (Steen, 2011a), it is not surprising that 
higher ranking journals are also more likely to publish fraudu- 
lent work than lower ranking journals (Fang et al., 2012). These 
data, however, cover only the small fraction of publications that 
have been retracted. More important is the large body of the lit- 
erature that is not retracted and thus actively being used by the 
scientific community. There is evidence that unreliability is also higher
in high-ranking journals for non-retracted publications:
A meta-analysis of genetic association studies provides
evidence that the extent to which a study over-estimates the likely 
true effect size is positively correlated with the IF of the journal in 
which it is published (Figure 1C) (Munafo et al., 2009). Similar 
effects have been reported in the context of other research fields 
(Ioannidis, 2005a; Ioannidis and Panagiotou, 2011; Siontis et al., 
2011). 
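
The bias score underlying Figure 1C can be made concrete with a small worked example. The sketch below assumes that the score is the natural logarithm of the ratio between a study's odds ratio and the pooled odds ratio from the meta-analysis (the figure legend specifies a log scale but not the base). The function name and the numbers are hypothetical, chosen only for illustration, and are not data from Munafo et al. (2009).

```python
import math


def bias_score(study_or: float, pooled_or: float) -> float:
    """Log ratio of a single study's odds ratio to the pooled odds ratio.

    Assumption: natural log. Positive values mean the study reported a
    larger effect than the meta-analytic estimate; zero means no difference.
    """
    if study_or <= 0 or pooled_or <= 0:
        raise ValueError("odds ratios must be positive")
    return math.log(study_or / pooled_or)


# Hypothetical values for illustration only.
early_study = bias_score(study_or=2.4, pooled_or=1.2)  # reports twice the pooled OR
later_study = bias_score(study_or=1.1, pooled_or=1.2)  # reports slightly less

print(f"early study: {early_study:+.3f}")
print(f"later study: {later_study:+.3f}")
```

The sign of the score does not depend on the base of the logarithm, so the reading "greater than zero means over-estimate" holds under any choice of base.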

There are additional measures of scientific quality and in none 
does journal rank fare much better. A study in crystallography
reports that the quality of the protein structures described
is significantly lower in publications in high-ranking journals
(Brown and Ramaswamy, 2007). Adherence to basic principles
of sound scientific (e.g., the CONSORT guidelines:
http://www.consort-statement.org) or statistical methodology has also been
tested. Four different studies on levels of evidence in medical 
and/or psychological research have found varying results. While 
two studies on surgery journals found a correlation between 
IF and the levels of evidence defined in the respective stud- 
ies (Obremskey et al., 2005; Lau and Samman, 2007), a study 
of anesthesia journals failed to find any statistically significant 
correlation between journal rank and evidence-based medicine 
principles (Bain and Myles, 2005) and a study of seven
medical/psychological journals found highly varying adherence to
statistical guidelines, irrespective of journal rank (Tressoldi et al.,
2013). The two surgery studies covered an IF range between
0.5 and 2.0, and 0.7 and 1.2, respectively, while the anesthesia 
study covered the range 0.8-3.5. It is possible that any corre- 
lation at the lower end of the scale is abolished when higher 
rank journals are included. The study by Tressoldi and colleagues, 
which included very high ranking journals, supports this inter- 
pretation. Importantly, if publications in higher ranking journals 
were methodologically sounder, then one would expect the oppo- 
site result: inclusion of high-ranking journals should result in a 
stronger, not a weaker correlation. Further supporting the notion 
that journal rank is a poor predictor of statistical soundness is 
our own analysis of data on statistical power in neuroscience 
studies (Button et al., 2013). There was no significant correlation
between statistical power and journal rank (N = 650; r_s = -0.01;
t = 0.8; Figure 2). Thus, the currently available data seem to
indicate that journal rank is a poor indicator of methodological 
soundness. 

Beyond explicit quality metrics and sound methodology, 
reproducibility is at the core of the scientific method and thus 
a hallmark of scientific quality. Three recent studies reported
attempts to replicate published findings in preclinical medicine
(Scott et al., 2008; Prinz et al., 2011; Begley and Ellis, 2012).
All three found a very low frequency of replication, suggesting
that maybe only one out of five preclinical findings is reproducible.
In fact, the level of reproducibility was so low that
no relationship between journal rank and reproducibility could
be detected. Hence, these data support the necessity of recent
efforts such as the "Reproducibility Initiative" (Baker, 2012) or
the "Reproducibility Project" (Collaboration, 2012). In fact, the
data also indicate that these projects may consider starting with
replicating findings published in high-ranking journals.

FIGURE 2 | No association between statistical power and journal IF.
The statistical power of 650 neuroscience studies (data from Button et al.,
2013; 19 missing ref; 3 unclear reporting; 57 published in journal without
2011 IF; 1 book) plotted as a function of the 2011 IF of the publishing journal.
The studies were selected from the 730 contributing to the meta-analyses
included in Button et al. (2013), Table 1, and included where journal title and
IF (2011 © Thomson Reuters Journal Citation Reports) were available.

Given all of the above evidence, it is therefore not surpris- 
ing that journal rank is also a strong predictor of the rate of 
retractions (Figure 1D) (Liu, 2006; Cokol et al., 2007; Fang and
Casadevall, 2011). 

SOCIAL PRESSURE AND JOURNAL RANK 

There are thus several converging lines of evidence which indi- 
cate that publications in high ranking journals are not only more 
likely to be fraudulent than articles in lower ranking journals, 
but also more likely to present discoveries which are less reli- 
able (i.e., are inflated, or cannot subsequently be replicated). 
Some of the sociological mechanisms behind these correlations 
have been documented, such as pressure to publish (preferably 
positive results in high-ranking journals), leading to the poten- 
tial for decreased ethical standards (Anderson et al., 2007) and 
increased publication bias in highly competitive fields (Fanelli, 
2010). The general increase in competitiveness, and the precar- 
iousness of scientific careers (Shapin, 2008), may also lead to an 
increased publication bias across the sciences (Fanelli, 2011). This
evidence supports earlier propositions about social pressure being 
a major factor driving misconduct and publication bias (Giles, 
2007), eventually culminating in retractions in the most extreme 
cases. 

That being said, it is clear that the correlation between journal 
rank and retraction rate is likely too strong (coefficient of deter- 
mination of 0.77; data from Fang and Casadevall, 2011) to be 
explained exclusively by the decreased reliability of the research 
published in high ranking journals. Probably, additional factors 
contribute to this effect. For instance, one such factor may be 
the greater visibility of publications in these journals, which is 
both one of the incentives driving publication bias, and a likely 
underlying cause for the detection of error or misconduct with 
the eventual retraction of the publications as a result (Cokol 
et al., 2007). Conversely, the scientific community may also be less
concerned about incorrect findings published in more obscure 
journals. With respect to the latter, the finding that the large 
majority of retractions come from the numerous lower-ranking 
journals (Fang et al., 2012) reveals that publications in lower 
ranking journals are scrutinized and, if warranted, retracted. 
Thus, differences in scrutiny are likely to be only a contributing 
factor and not an exclusive explanation, either. With respect to the 
former, visibility effects in general can be quantified by measur- 
ing citation rates between journals, testing the assumption that if 
higher visibility were a contributing factor to retractions, it must 
also contribute to citations. 



JOURNAL RANK AND STUDY IMPACT 

Thus far we have presented evidence that research published in 
high-ranking journals may be less reliable compared with publi- 
cations in lower-ranking journals. Nevertheless, there is a strong 
common perception that high-ranking journals publish "better" 
or "more important" science, and that the IF captures this well 
(Gordon, 1982; Saha et al., 2003). The assumption is that high- 
ranking journals are able to be highly selective and publish only 
the most important, novel and best-supported scientific discov- 
eries, which will then, as a consequence of their quality, go on 
to be highly cited (Young et al., 2008). One way to reconcile this 
common perception with the data would be that, while journal 
rank may be indicative of a minority of unreliable publications, 
it may also (or more strongly) be indicative of the importance of 
the majority of remaining, reliable publications. Indeed, a recent 
study on clinical trial meta-analyses found that a measure for the 
novelty of a clinical trial's main outcome did correlate signifi- 
cantly with journal rank (Evangelou et al., 2012). Compared to 
this relatively weak correlation (with all coefficients of determi- 
nation lower than 0.1), a stronger correlation was reported for 
journal rank and expert ratings of importance (Allen et al., 2009). 
In this study, the journal in which the study had appeared was 
not masked, thus not excluding the strong correlation between 
subjective journal rank and journal quality as a confounding fac- 
tor. Nevertheless, there is converging evidence from two studies 
that journal rank is indeed indicative of a publication's perceived 
importance. 

Beyond the importance or novelty of the research, there are 
three additional reasons why publications in high-ranking jour- 
nals might receive a high number of citations. First, publications 
in high-ranking journals achieve greater exposure by virtue not 
only of the larger circulation of the journal in which they appear, 
but also of the more prominent media attention (Gonon et al., 
2012). Second, citing high-ranking publications in one's own 
publication may increase its perceived value. Third, the novel, 
surprising, counter-intuitive or controversial findings often pub- 
lished in high-ranking journals, draw citations not only from 
follow-up studies but also from news-type articles in scholarly 
journals reporting and discussing the discovery. Despite these 
four factors, which would suggest considerable effects of journal 
rank on future citations, it has been established for some time that 
the actual effect of journal rank is measurable, but nowhere near 
as substantial as indicated (Seglen, 1994, 1997; Callaham, 2002; 
Chow et al., 2007; Kravitz and Baker, 2011; Hegarty and Walton, 
2012; Finardi, 2013) and as one would expect if visibility were the 
exclusive factor driving retractions. In fact, the average effect sizes 
roughly approach those for journal rank and unreliability, cited 
above. 

The data presented in a recent analysis of the development 
of these correlations between IF-based journal rank and future 
citations over the period from 1902-2009 (with IFs before the 
1960's computed retroactively) reveal two very informative trends 
(Figure 3, data from Lozano et al., 2012). First, while the predictive
power of journal rank remained very low for the entire first
two thirds of the twentieth century, it started to slowly increase 
shortly after the publication of the first IF data in the 1960's. 
This correlation kept increasing until the second interesting trend
emerged with the advent of the internet and keyword-search
engines in the 1990's, from which time on it fell back to
pre-1960's levels until the end of the study period in 2009. Overall,
consistent with the citation data already available, the coefficient
of determination between journal rank and citations was always
in the range of ~0.1 to 0.3 (i.e., quite low). It thus appears that
indeed a small but significant correlation between journal rank
and future citations can be observed. Moreover, the data suggest
that most of this small effect stems from visibility effects
due to the influence of the IF on reading habits (Lozano et al.,
2012), rather than from factors intrinsic to the published articles
(see data cited above). However, the correlation is so weak that it
cannot alone account for the strong correlation between retractions
and journal rank, but instead requires additional factors,
such as the increased unreliability of publications in high ranking
journals cited above. Supporting these weak correlations between
journal rank and future citations are data reporting classification
errors (i.e., whether a publication received too many or too few
citations with regard to the rank of the journal it was published
in) at or exceeding 30% (Starbuck, 2005; Chow et al., 2007; Singh
et al., 2007; Kravitz and Baker, 2011). In fact, these classification
errors, in conjunction with the weak citation advantage, render
journal rank practically useless as an evaluation signal, even if
there were no indication of less reliable science being published
in high ranking journals.

FIGURE 3 | Trends in predicting citations from journal rank. The
coefficient of determination (R²) between journal rank (as measured by IF)
and the citations accruing over 2 years after publication is plotted as a
function of publication year in a sample of almost 30 million publications.
Lozano et al. (2012) make the case that one can explain the trends in the
predictive value of journal rank by the publication of the IF in the 1960's (R²
increase is accelerating) and the widespread adoption of internet searches
in the 1990's (R² is dropping). The data support the interpretation that
reading habits drive the correlation between journal rank and citations more
than any inherent quality of the articles. IFs for the years before the 1960's,
i.e., before the invention of the IF, have been computed retroactively.
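
To give a sense of what a coefficient of determination in this range means, the sketch below computes R² as the squared Pearson correlation between journal IF and per-article citation counts. The data are invented toy values, not figures from Lozano et al. (2012), and the helper name `r_squared` is an assumption made for illustration.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov ** 2 / (var_x * var_y)


# Toy data (hypothetical): IF of the publishing journal and the citations
# each article received in the two years after publication.
journal_if = [1.2, 2.5, 3.1, 4.8, 6.0, 9.7, 14.3, 30.1]
citations = [3, 0, 12, 2, 25, 4, 1, 18]

r2 = r_squared(journal_if, citations)
print(f"R^2 = {r2:.2f}")

# An R^2 of 0.1 to 0.3 corresponds to |r| of roughly 0.32 to 0.55.
for value in (0.1, 0.3):
    print(f"R^2 = {value:.1f}  ->  |r| = {math.sqrt(value):.2f}")
```

Even a value at the top of the reported range leaves about 70% of the variance in citations unexplained by journal rank.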

The only measure of citation count that does correlate strongly 
with journal rank (negatively) is the number of articles without 
any citations at all (Weale et al., 2004), supporting the argument
that fewer articles in high-ranking journals go unread. Thus,
there is quite extensive evidence arguing for the strong correlation
between journal rank and retraction rate to be mainly due to
two factors: there is direct evidence that the social pressures to
publish in high ranking journals increase the unreliability, intentional
or not, of the research published there. There is more
indirect evidence, derived mainly from citation data, indicating 
that increased visibility of publications in high ranking journals 
may potentially contribute to increased error-detection in these 
journals. With several independent measures failing to provide 
compelling evidence that journal rank is a reliable predictor of 
scientific impact or quality, and other measures indicating that 
journal rank is at least equally if not more predictive of low relia- 
bility, the central role of journal rank in modern science deserves 
close scrutiny. 

PRACTICAL CONSEQUENCES OF JOURNAL RANK 

Even if a particular study has been performed to the highest stan- 
dards, the quest for publication in high-ranking journals slows 
down the dissemination of science and increases the burden 
on reviewers, by iterations of submissions and rejections cas- 
cading down the hierarchy of journal rank (Statzner and Resh, 
2010; Kravitz and Baker, 2011; Nosek and Bar-Anan, 2012). A 
recent study seems to suggest that such rejections eventually
improve manuscripts enough to yield measurable citation benefits
(Calcagno et al., 2012). However, the effect size of such
resubmissions appears to be of the order of 0.1 citations per 
article, a statistically significant but, in practical terms, negligi- 
ble effect. This conclusion is corroborated by an earlier study 
which failed to find any such effect (Nosek and Bar-Anan, 
2012). Moreover, with peer-review costs estimated in excess 
of 2.2 billion € (US$ ~2.8b) annually (Research Information 
Network, 2008), the resubmission cascade contributes to the 
already rising costs of journal rank: the focus on journal rank 
has allowed corporate publishers to keep their most presti- 
gious journals closed-access and to increase subscription prices 
(Kyrillidou et al., 2012), creating additional barriers to the dis- 
semination of science. The argument from highly selective jour- 
nals is that their per-article cost would be too high for author 
processing fees, which may be up to 37,000€ (US$ 48,000) 
for the journal Nature (House of Commons, 2004). There is 
also evidence from one study in economics suggesting that 
journal rank can contribute to suppression of interdisciplinary 
research (Rafols et al., 2012), keeping disciplines separate and
isolated. 

Finally, the attention given to publication in high-ranking 
journals may distort the communication of scientific progress, 
both inside and outside of the scientific community. For instance, 
the recent discovery of a "Default-Mode Network" in rodent 
brains was, presumably, made independently by two different 
sets of neuroscientists and published only a few months apart 
(Upadhyay et al., 2011; Lu et al., 2012). The later, but not the
earlier, publication (Lu et al., 2012) was cited in a subsequent 
high-ranking publication (Welberg, 2012). Despite both studies 
largely reporting identical findings (albeit, perhaps, with differ- 
ent quality), the later report has garnered 19 citations, while the 
earlier one only 5, at the time of this writing. We do not know 
of any empirical studies quantitatively addressing this particu- 
lar effect of journal rank. However, a similar distortion due to 
selective attention to publications in high-ranking journals has
been reported in a study on medical research. This study found
media reporting to be distorted, such that once initial findings
in higher-ranking journals have been refuted by publications in
lower ranking journals (a case of decline effect), the refutations do not
receive adequate media coverage (Gonon et al., 2012).

IMPACT FACTOR: NEGOTIATED, IRREPRODUCIBLE, AND UNSOUND

The IF is a metric for the number of citations to articles in a jour- 
nal (the numerator), normalized by the number of articles in that 
journal (the denominator). However, there is evidence that IF is, 
at least in some cases, not calculated but negotiated, that it is not 
reproducible, and that, even if it were reproducibly computed, 
the way it is derived is not mathematically sound. The fact that 
publishers have the option to negotiate how their IF is calculated
is well-established; in the case of PLoS Medicine, the negotiation
range was between 2 and about 11 (Editorial, 2006). What
is negotiated is the denominator in the IF equation (i.e., which
published articles are counted), given that all citations
count toward the numerator whether or not the cited items
are included in the denominator. It has thus been public
knowledge for quite some time now that removing editorials and
News-and-Views articles (so-called "front-matter") from the
denominator can dramatically alter the resulting IF (Moed and Van
Leeuwen, 1995, 1996; Baylis et al., 1999; Garfield, 1999; Adam,
2002; Editorial, 2005; Hernan, 2009). While these IF negotiations
are rarely made public, the number of citations (numerator) and 
published articles (denominator) used to calculate IF are accessi- 
ble via Journal Citation Reports. This database can be searched for 
evidence that the IF has been negotiated. For instance, the numer- 
ator and denominator values for Current Biology in 2002 and 
2003 indicate that while the number of citations remained rela- 
tively constant, the number of published articles dropped. This 
decrease occurred after the journal was purchased by Cell Press 
(an imprint of Elsevier), despite there being no change in the 
layout of the journal. Critically, the arrival of a new publisher 
corresponded with a retrospective change in the denominator 
used to calculate IF (Table 1). Similar procedures raised the IF 
of FASEB Journal from 0.24 in 1988 to 18.3 in 1989, when 
conference abstracts ceased to count toward the denominator 
(Baylis et al., 1999). 

Table 1 | Thomson Reuters' IF calculations for the journal "Current 
Biology" in the years 2002/2003. 

Journal: Current Biology    Items   Items   Items   Items   Citations   IF
                            2000    2001    2002    (sum)
JCR Science Edition 2002    504     528     n.c.    1032    7231         7.007
JCR Science Edition 2003    n.c.    300     334     634     7551        11.910

Most of the rise in IF is due to the reduction in published items. 
Note the discrepancy in the number of items published in 2001 between 
the two consecutive JCR Science Editions. 
n.c.: year not covered by this edition. For raw data, see Figure A1. 
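The arithmetic behind Table 1 can be retraced in a few lines. The following is a minimal sketch, not the procedure used by Thomson Reuters: the function name `impact_factor` and the counterfactual decomposition are assumptions introduced here for illustration, and only the six input numbers come from Table 1.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Citations (numerator) divided by counted items (denominator)."""
    return citations / citable_items


# Values copied from Table 1 (JCR Science Editions 2002 and 2003).
if_2002 = impact_factor(7231, 504 + 528)   # items from 2000 and 2001
if_2003 = impact_factor(7551, 300 + 334)   # items from 2001 and 2002

# Hypothetical counterfactual (an assumption of this sketch): the 2003
# citation count divided by the item total reported in the 2002 edition.
counterfactual_2003 = impact_factor(7551, 504 + 528)

rise_total = if_2003 - if_2002
rise_from_citations = counterfactual_2003 - if_2002
share_from_denominator = 1 - rise_from_citations / rise_total

print(f"IF 2002: {if_2002:.3f}")
print(f"IF 2003: {if_2003:.3f}")
print(f"Share of rise attributable to fewer items: {share_from_denominator:.0%}")
```

Under this simple (and order-dependent) decomposition, nearly all of the rise is attributable to the smaller denominator, which is consistent with the note accompanying Table 1.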

In an attempt to test the accuracy of the ranking of some 
of their journals by IF, Rockefeller University Press purchased 
access to the citation data of their journals and some competi- 
tors. They found numerous discrepancies between the data they 
received and the published rankings, sometimes leading to differ- 
ences of up to 19% (Rossner et al., 2007). When asked to explain 
this discrepancy, Thomson Reuters replied that they routinely use 
several different databases and had accidentally sent Rockefeller 
University Press the wrong one. However, a second database 
sent subsequently also did not match the published records. This 
is only one of a number of reported errors and inconsistencies 
(Reedijk, 1998; Moed et al., 1996). 

It is well-known that citation data are strongly right-skewed 
(positively skewed), meaning that a small number of publications 
receive a large number of citations, while most publications 
receive very few (Seglen, 1992, 1997; Weale et al., 2004; 
Editorial, 2005; Chow et al., 2007; Rossner et al., 2007; Taylor 
et al., 2008; Kravitz and Baker, 2011). The use of an arithmetic 
mean as a measure of central tendency on such data (rather than, 
say, the median) is clearly inappropriate, but this is exactly 
what is used in the IF calculation. The International 
Mathematical Union reached the same conclusion in an analysis of 
the IF (Adler et al., 2008). A recent study correlated the median 
citation frequency in a sample of 100 journals with their 2-year 
IF and found a very strong correlation, which is expected due to 
the similarly right-skewed distributions in most journals 
(Editorial, 2013). However, at the time of this writing, it 
is not known if using the median (instead of the mean) improves 
any of the predominantly weak predictive properties of journal 
rank. Complementing the specific flaws just mentioned, a recent, 
comprehensive review of the bibliometric literature lists vari- 
ous additional shortcomings of the IF more generally (Vanclay, 
2011). 
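Why the mean is a poor summary of such data can be illustrated with a small numerical sketch. The citation counts below are invented for illustration (they are not taken from any journal); they merely mimic a distribution in which a few articles collect most of the citations.

```python
from statistics import mean, median

# Hypothetical citation counts for 100 articles of one journal
# (invented numbers, chosen only to mimic a heavily skewed distribution).
citations = [0] * 40 + [1] * 25 + [2] * 15 + [5] * 10 + [10] * 6 + [50] * 3 + [400]

mean_citations = mean(citations)      # what an IF-style average reports
median_citations = median(citations)  # what a typical article receives

# Fraction of articles that are cited less often than the mean suggests.
share_below_mean = sum(c < mean_citations for c in citations) / len(citations)

print(f"mean = {mean_citations}, median = {median_citations}")
print(f"{share_below_mean:.0%} of articles fall below the mean")
```

In this made-up sample the mean is several times larger than the median, and the large majority of articles are cited less often than the journal-level average would suggest.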

CONCLUSIONS 

While at this point it seems impossible to quantify the relative 
contributions of the different factors influencing the reliability 
of scientific publications, the current empirical literature on the 
effects of journal rank provides evidence supporting the following 
four conclusions: (1) journal rank is a weak to moderate pre- 
dictor of utility and perceived importance; (2) journal rank is a 
moderate to strong predictor of both intentional and uninten- 
tional scientific unreliability; (3) journal rank is expensive, delays 
science and frustrates researchers; and, (4) journal rank as estab- 
lished by IF violates even the most basic scientific standards, but 
predicts subjective judgments of journal quality. 

CAVEATS 

While our latter two conclusions appear uncontroversial, the for- 
mer two are counter-intuitive and require explanation. Weak 
correlations between future citations and journal rank based on 
IF may be caused by the poor statistical properties of the IF. 
This explanation could (and should) be tested by using any of 
the existing alternative ranking tools available (such as Thomson 
Reuters' Eigenfactor, Scopus' SCImago Journal Rank, or Google 
Scholar Metrics) and computing correlations with the metrics 
discussed above. However, a recent analysis shows a high 
correlation between these ranks, so no large differences would be 
expected (Lopez-Cozar and Cabezas-Clavijo, 2013). Alternatively, 
one can choose other important metrics and compute which jour- 
nals score particularly high on these. Either way, since the IF 
reflects the common perception of journal hierarchies rather well 
(Gordon, 1982; Saha et al., 2003; Yue et al., 2007; 
Sonderstrup-Andersen and Sonderstrup-Andersen, 2008), any 
alternative hierarchy that would better reflect article citation 
frequencies might 
violate this intuitive sense of journal rank, as different ways 
to compute journal rank lead to different hierarchies (Wagner, 
2011). Both alternatives thus challenge our subjective journal 
ranking. To put it more bluntly, if perceived importance and 
utility were to be discounted as indirect proxies of quality, 
while retraction rate, replicability, effect size overestimation, 
correct sample sizes, crystallographic quality, sound methodology 
and so on counted as more direct measures of quality, then 
inverting the current IF-based journal hierarchy would improve 
the alignment of journal rank with most of these more direct 
measures of quality and have no effect on the rest. 
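The test proposed above, correlating an alternative journal ranking with another metric, amounts to a rank correlation. The following is a minimal sketch using Spearman's coefficient; the five journals and both score lists are hypothetical placeholders, not data from any of the cited studies, and the helper names are introduced here for illustration.

```python
from statistics import mean


def ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five journals (placeholders, not real data).
alternative_rank_score = [30.1, 12.4, 7.0, 3.2, 1.1]
other_metric = [0.9, 0.5, 0.6, 0.1, 0.05]

rho = spearman(alternative_rank_score, other_metric)
print(f"Spearman rho = {rho:.2f}")
```

With real data, the same few lines could be run once per ranking tool and per quality measure, making the comparison between rankings a matter of swapping the input lists.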

The subjective journal hierarchy also leads to a circularity that 
confounds many empirical studies. That is, authors use jour- 
nal rank, in part, to make decisions of where to submit their 
manuscripts, such that well-performed studies yielding ground- 
breaking discoveries with general implications are preferentially 
submitted to high-ranking journals. Readers, in turn, expect only 
to read about such articles in high-ranking journals, leading to the 
exposure and visibility confounds discussed above and at length 
in the cited literature. Moreover, citation practices and method- 
ological standards vary in different scientific fields, potentially 
distorting both the citation and reliability data. Given these 
confounds, one might expect highly varying and often inconclusive 
results. Despite this, the literature contains evidence for associ- 
ations between journal rank and measures of scientific impact 
(e.g., citations, importance, and unread articles), but also con- 
tains at least equally strong, consistent effects of journal rank 
predicting scientific unreliability (e.g., retractions, effect size, 
sample size, replicability, fraud/misconduct, and methodology). 
Neither group of studies can thus be easily dismissed, suggesting 
that the incentives journal rank creates for the scientific commu- 
nity (to submit either their best or their most unreliable work 
to the most high-ranking journals) at best cancel each other out. 
Such unintended consequences are well-known from other fields 
where metrics are applied (Hauser and Katz, 1998). 

Therefore, while there are concerns not only about the validity 
of the IF as the metric of choice for establishing journal rank but 
also about confounding factors complicating the interpretation 
of some of the data, we find, in the absence of additional data, 
that these concerns do not suffice to substantially question our 
conclusions, but do emphasize the need for future research. 

POTENTIAL LONG-TERM CONSEQUENCES OF JOURNAL RANK 

Taken together, the reviewed literature suggests that using 
journal rank is unhelpful at best and unscientific at worst. In 
our view, IF generates an illusion of exclusivity and prestige 
based 
on an assumption that it predicts scientific quality, which is not 
supported by empirical data. As the IF aligns well with intuitive 
notions of journal hierarchies (Gordon, 1982; Saha et al., 2003; 
Yue et al., 2007), it receives insufficient scrutiny (Frank, 2003) 
(perhaps a case of confirmation bias). The one field in which 
journal rank is scrutinized is bibliometrics. We have reviewed the 
pertinent empirical literature to supplement the largely 
argumentative discussion on the opinion pages of many learned 
journals (Moed and Van Leeuwen, 1996; Lawrence, 2002, 2007, 2008; 
Bauer, 2004; Editorial, 2005; Giles, 2007; Taylor et al., 2008; 
Todd and Ladle, 2008; Tsikliras, 2008; Adler and Harzing, 2009; 
Garwood, 2011; Schooler, 2011; Brumback, 2012; Sarewitz, 2012). 
Much like dowsing, homeopathy or astrology, journal rank seems to 
appeal to subjective impressions of 
certain effects, but these effects disappear as soon as they are 
subjected to scientific scrutiny. 

In our understanding of the data, the social and psychological 
influences described above are, at least to some extent, gener- 
ated by journal rank itself, which in turn may contribute to the 
observed decline effect and rise in retraction rate. That is, systemic 
pressures on the author, rather than increased scrutiny on the part 
of the reader, inflate the unreliability of much scientific research. 
Without reform of our publication system, the incentives associ- 
ated with increased pressure to publish in high-ranking journals 
will continue to encourage scientists to be less cautious in their 
conclusions (or worse), in an attempt to market their research 
to the top journals (Anderson et al., 2007; Giles, 2007; Shapin, 
2008; Munafo et al., 2009; Fanelli, 2010). This is reflected in the 
decline in null results reported across disciplines and countries 
(Fanelli, 2011), and corroborated by the findings that much of the 
increase in retractions may be due to misconduct (Steen, 2011b; 
Fang et al., 2012), and that much of this misconduct occurs in 
studies published in high-ranking journals (Steen, 2011a; Fang 
et al., 2012). Inasmuch as journal rank guides the appointment and 
promotion policies of research institutions, the increasing rate of 
misconduct that has recently been observed may prove to be but 
the beginning of a pandemic: It is conceivable that, for the last 
few decades, research institutions world-wide may have been hir- 
ing and promoting scientists who excel at marketing their work to 
top journals, but who are not necessarily equally good at conduct- 
ing their research. Conversely, these institutions may have purged 
excellent scientists from their ranks, whose marketing skills did 
not meet institutional requirements. If this interpretation of the 
data is correct, a generation of excellent marketers (possibly, but 
not necessarily, also excellent scientists) now serve as the leading 
figures and role models of the scientific enterprise, constitut- 
ing another potentially major contributing factor to the rise in 
retractions. 

The implications of the data presented here go beyond the 
reliability of scientific publications. Public trust in science and 
scientists has been in decline for some time in many coun- 
tries (Nowotny, 2005; European Commission, 2010; Gauchat, 
2010), dramatically so in some sections of society (Gauchat, 
2012), culminating in the sentiment that scientists are noth- 
ing more than yet another special interest group (Miller, 2012; 
Sarewitz, 2013). In the words of Daniel Sarewitz: "Nothing will 
corrode public trust more than a creeping awareness that 
scientists are unable to live up to the standards that they have 
set for themselves" (Sarewitz, 2012). The data presented here 
prompt the suspicion that the corrosion has already begun and 
that journal rank may have played a part in this decline as 
well. 

ALTERNATIVES 

Alternatives to journal rank exist: we now have technology at 
our disposal which allows us to perform all of the functions 
journal rank is currently supposed to perform in an unbiased, 
dynamic way on a per-article basis, allowing the research com- 
munity greater control over selection, filtering, and ranking of 
scientific information (Honekopp and Khan, 2011; Kravitz and 
Baker, 2011; Lin, 2012; Priem et al., 2012; Roemer and Borchardt, 
2012; Priem, 2013). Since there is no technological reason to con- 
tinue using journal rank, one implication of the data reviewed 
here is that we can instead use current technology and remove 
the need for a journal hierarchy completely. As we have argued, it 
is not only technically obsolete, but also counter-productive and 
a potential threat to the scientific endeavor. We therefore would 
favor bringing scholarly communication back to the research 
institutions in an archival publication system in which 
software, raw data and their text descriptions are archived and made 
accessible, after peer-review and with scientifically-tested metrics 
accruing reputation in a constantly improving reputation system 
(Eve, 2012). This reputation system would be subjected to the 
same standards of scientific scrutiny as are commonly applied to 
all scientific matters and evolve to minimize gaming and maxi- 
mize the alignment of researchers' interests with those of science 
[which are currently misaligned (Nosek et al., 2012)]. Only an 
elaborate ecosystem of a multitude of metrics can provide the 
flexibility to capitalize on the small fraction of the multi-faceted 
scientific output that is actually quantifiable. Such an ecosystem 
would evolve such that the only evolutionarily stable strategy is 
to try and do the best science one can. 

The currently balkanized literature, with a lack of interoper- 
ability and standards as one of its many detrimental, unintended 
consequences, prevents the kind of innovation that gave rise 
to the discovery functions of Amazon or eBay, the social 
networking functions of Facebook or Reddit and, of course, the sort 
and search functions of Google, all technologies virtually every 
scientist uses regularly for all activities but science. Thus, frag- 
mentation and the resulting lack of access and interoperability 
are among the main underlying reasons why journal rank has not 
yet been replaced by more scientific evaluation options, despite 
widespread access to article-level metrics today. With an openly 
accessible scholarly literature standardized for interoperability, 
it would of course still be possible to pay professional editors 
to select publications, as is the case now, but after publication. 
These editors would then actually compete with each other for 
paying customers, accumulating track records for selecting (or 
missing) the most important discoveries. Likewise, virtually any 
functionality the current system offers would easily be replicable 
in the system we envisage. However, above and beyond replicating 
current functionality, an open, standardized scholarly literature 
would place any and all conceivable scientific metrics only a few 
lines of code away, offering the possibility of a truly open 
evaluation system where any hypothesis can be tested. Metrics, social 
networks and intelligent software then can provide each individ- 
ual user with regular, customized updates on the most relevant 
research. These updates respond to the behavior of the user 
and learn from and evolve with their preferences. With openly 
accessible, interoperable literature, data and software, agents can 
be developed that independently search for hypotheses in the 
vast knowledge accumulating there. But perhaps most impor- 
tantly, with an openly accessible database of science, innovation 
can thrive, bringing us features and ideas nobody can think of 
today and nobody will ever be capable of imagining, if we do 
not bring the products of our labor back under our own 
control. It was the hypertext transfer protocol (HTTP) standard that 
spurred innovation and made the internet what it is today. What 
is required is the equivalent of HTTP for scholarly literature, data 
and software. 
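To illustrate how short such a metric could be once article records are openly accessible in a standard form, the following sketch computes a field-normalized citation score per article. The record layout (`id`, `field`, `citations`) and all values are hypothetical assumptions made for this example; no existing database or interface is implied.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical open article records (invented for illustration).
articles = [
    {"id": "a1", "field": "neuroscience", "citations": 40},
    {"id": "a2", "field": "neuroscience", "citations": 10},
    {"id": "a3", "field": "mathematics", "citations": 4},
    {"id": "a4", "field": "mathematics", "citations": 1},
    {"id": "a5", "field": "mathematics", "citations": 1},
]


def field_normalized_scores(records):
    """Divide each article's citations by the mean of its own field."""
    by_field = defaultdict(list)
    for record in records:
        by_field[record["field"]].append(record["citations"])
    baseline = {field: mean(counts) for field, counts in by_field.items()}
    return {
        record["id"]: (
            record["citations"] / baseline[record["field"]]
            if baseline[record["field"]] > 0
            else 0.0  # field without any citations: no meaningful ratio
        )
        for record in records
    }


scores = field_normalized_scores(articles)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy example, a mathematics article with four citations ranks above a neuroscience article with forty, because each is compared with its own field. Any such metric would itself have to be tested empirically before being used for evaluation.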

Funds currently spent on journal subscriptions could easily 
suffice to finance the initial conversion of scholarly communica- 
tion, even if only as long-term savings. One avenue to move in 
this direction may be the recently announced Episcience Project 
(Van Noorden, 2013). Other solutions certainly exist (Bachmann, 
2011; Birukou et al., 2011; Kravitz and Baker, 2011; Kreiman 
and Maunsell, 2011; Zimmermann et al., 2011; Beverungen 
et al., 2012; Florian, 2012; Ghosh et al., 2012; Hartshorne and 
Schachner, 2012; Hunter, 2012; Ietto-Gillies, 2012; Kriegeskorte, 
2012; Kriegeskorte et al., 2012; Lee, 2012; Nosek and Bar-Anan, 
2012; Poschl, 2012; Priem and Hemminger, 2012; Sandewall, 
2012; Walther and Van den Bosch, 2012; Wicherts et al., 
2012; Yarkoni, 2012), but the need for an alternative system 
is clearly pressing (Casadevall and Fang, 2012). Given the data 
we surveyed above, almost anything appears superior to the 
status quo. 
FIGURE A1 | Impact Factor of the journal "Current Biology" in the 
years 2002 (above) and 2003 (below) showing a roughly 70% increase 
in impact. The increase in the IF of the journal "Current Biology" 
from approx. 7 to almost 12 from one edition of Thomson Reuters' 
"Journal Citation Reports" to the next is due to a retrospective 
adjustment of the number of items published (marked), while the 
actual citations remained relatively constant. 